Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)


and install R. You can also download and install RStudio by following the instructions

at “https://www.rstudio.com/products/rstudio/download/”. Once you have R installed on

your computer, start R and, at the R prompt, run the following to install the packages

required for the remaining ChIP-Seq data analysis (Bioconductor packages, plus dplyr from CRAN):

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("clusterProfiler")
BiocManager::install("ChIPseeker")
BiocManager::install("TxDb.Hsapiens.UCSC.hg19.knownGene")
BiocManager::install("EnsDb.Hsapiens.v75")
BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")
install.packages("dplyr")
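Before loading the packages, it can help to confirm that each one actually installed. The following is a minimal sketch using base R's requireNamespace(); the variable names pkgs and installed are illustrative and do not come from the original text.

```r
# Packages named in the installation step above.
pkgs <- c("clusterProfiler", "ChIPseeker",
          "TxDb.Hsapiens.UCSC.hg19.knownGene", "EnsDb.Hsapiens.v75",
          "AnnotationDbi", "org.Hs.eg.db", "dplyr")

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise; it does not attach the package to the search path.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  message("All packages are available.")
} else {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}
```

Any package reported as not installed can be reinstalled with the corresponding command above before continuing.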

After installing the packages, load them as follows:

library(clusterProfiler)

library(ChIPseeker)

library(TxDb.Hsapiens.UCSC.hg19.knownGene)

library(EnsDb.Hsapiens.v75)

library(AnnotationDbi)

library(org.Hs.eg.db)

library("dplyr")

Now, we are ready to finish the remaining analysis in R. Copy “chip1_peaks.narrowPeak”,

“chip2_peaks.narrowPeak”, and “chip3_peaks.narrowPeak” files produced by MACS3

above to a directory that you can browse from inside R, and set that directory as your R

working directory (for example, with setwd()). In the following, we use functions from the

ChIPseeker [9] package to create different plots.
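As a minimal sketch of this step, the code below sets the working directory and checks that the three peak files are visible from R. The variable peak_dir is a hypothetical placeholder (set here to the current directory so the code runs anywhere); replace it with the folder where you copied the MACS3 output.

```r
# Hypothetical location of the copied files; replace with your own path.
peak_dir <- getwd()
setwd(peak_dir)

# File names produced by MACS3 in the earlier steps.
peak_files <- c("chip1_peaks.narrowPeak",
                "chip2_peaks.narrowPeak",
                "chip3_peaks.narrowPeak")

# file.exists() returns one TRUE/FALSE value per file name.
found <- setNames(file.exists(peak_files), peak_files)
print(found)

if (!all(found)) {
  message("Missing from ", peak_dir, ": ",
          paste(peak_files[!found], collapse = ", "))
}
```

If any file is reported as missing, check the path before running the plotting code, since read.table() stops with an error when it cannot open a file.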

6.3.6.1 ChIP-Seq Peaks’ Coverage Plot

The “covplot()” function is used to create a plot that shows peak distribution over the

whole genome. This function calculates the coverage of peak regions over chromosomes

or regions of chromosomes and generates a peaks’ coverage plot. The function requires

the peak data as a GRanges (genomic ranges) object (see the ChIPseeker documentation). The

following code loads each file as a data frame, adds column names, creates a GRanges object,

and finally creates the ChIP-Seq peaks’ coverage plot for each sample:

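# Read the first sample's peaks; narrowPeak files have no header row
peaks1 <- read.table("chip1_peaks.narrowPeak", header = FALSE)
# Names for the ten narrowPeak columns, in file order
colnames <- c("chrom", "start", "end", "name", "score", "strand",
              "signal", "pvalue", "qvalue", "peak")
colnames(peaks1) <- colnames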